## ListingKey ListingNumber
## 17A93590655669644DB4C06: 6 Min. : 4
## 349D3587495831350F0F648: 4 1st Qu.: 400919
## 47C1359638497431975670B: 4 Median : 600554
## 8474358854651984137201C: 4 Mean : 627886
## DE8535960513435199406CE: 4 3rd Qu.: 892634
## 04C13599434217079754AEE: 3 Max. :1255725
## (Other) :113912
## ListingCreationDate CreditGrade Term
## 2013-10-02 17:20:16.550000000: 6 :84984 Min. :12.00
## 2013-08-28 20:31:41.107000000: 4 C : 5649 1st Qu.:36.00
## 2013-09-08 09:27:44.853000000: 4 D : 5153 Median :36.00
## 2013-12-06 05:43:13.830000000: 4 B : 4389 Mean :40.83
## 2013-12-06 11:44:58.283000000: 4 AA : 3509 3rd Qu.:36.00
## 2013-08-21 07:25:22.360000000: 3 HR : 3508 Max. :60.00
## (Other) :113912 (Other): 6745
## LoanStatus ClosedDate
## Current :56576 :58848
## Completed :38074 2014-03-04 00:00:00: 105
## Chargedoff :11992 2014-02-19 00:00:00: 100
## Defaulted : 5018 2014-02-11 00:00:00: 92
## Past Due (1-15 days) : 806 2012-10-30 00:00:00: 81
## Past Due (31-60 days): 363 2013-02-26 00:00:00: 78
## (Other) : 1108 (Other) :54633
## BorrowerAPR BorrowerRate LenderYield
## Min. :0.00653 Min. :0.0000 Min. :-0.0100
## 1st Qu.:0.15629 1st Qu.:0.1340 1st Qu.: 0.1242
## Median :0.20976 Median :0.1840 Median : 0.1730
## Mean :0.21883 Mean :0.1928 Mean : 0.1827
## 3rd Qu.:0.28381 3rd Qu.:0.2500 3rd Qu.: 0.2400
## Max. :0.51229 Max. :0.4975 Max. : 0.4925
## NA's :25
## EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## Min. :-0.183 Min. :0.005 Min. :-0.183
## 1st Qu.: 0.116 1st Qu.:0.042 1st Qu.: 0.074
## Median : 0.162 Median :0.072 Median : 0.092
## Mean : 0.169 Mean :0.080 Mean : 0.096
## 3rd Qu.: 0.224 3rd Qu.:0.112 3rd Qu.: 0.117
## Max. : 0.320 Max. :0.366 Max. : 0.284
## NA's :29084 NA's :29084 NA's :29084
## ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## Min. :1.000 :29084 Min. : 1.00
## 1st Qu.:3.000 C :18345 1st Qu.: 4.00
## Median :4.000 B :15581 Median : 6.00
## Mean :4.072 A :14551 Mean : 5.95
## 3rd Qu.:5.000 D :14274 3rd Qu.: 8.00
## Max. :7.000 E : 9795 Max. :11.00
## NA's :29084 (Other):12307 NA's :29084
## ListingCategory..numeric. BorrowerState
## Min. : 0.000 CA :14717
## 1st Qu.: 1.000 TX : 6842
## Median : 1.000 NY : 6729
## Mean : 2.774 FL : 6720
## 3rd Qu.: 3.000 IL : 5921
## Max. :20.000 : 5515
## (Other):67493
## Occupation EmploymentStatus
## Other :28617 Employed :67322
## Professional :13628 Full-time :26355
## Computer Programmer : 4478 Self-employed: 6134
## Executive : 4311 Not available: 5347
## Teacher : 3759 Other : 3806
## Administrative Assistant: 3688 : 2255
## (Other) :55456 (Other) : 2718
## EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## Min. : 0.00 False:56459 False:101218
## 1st Qu.: 26.00 True :57478 True : 12719
## Median : 67.00
## Mean : 96.07
## 3rd Qu.:137.00
## Max. :755.00
## NA's :7625
## GroupKey DateCreditPulled
## :100596 2013-12-23 09:38:12: 6
## 783C3371218786870A73D20: 1140 2013-11-21 09:09:41: 4
## 3D4D3366260257624AB272D: 916 2013-12-06 05:43:16: 4
## 6A3B336601725506917317E: 698 2014-01-14 20:17:49: 4
## FEF83377364176536637E50: 611 2014-02-09 12:14:41: 4
## C9643379247860156A00EC0: 342 2013-09-27 22:04:54: 3
## (Other) : 9634 (Other) :113912
## CreditScoreRangeLower CreditScoreRangeUpper
## Min. : 0.0 Min. : 19.0
## 1st Qu.:660.0 1st Qu.:679.0
## Median :680.0 Median :699.0
## Mean :685.6 Mean :704.6
## 3rd Qu.:720.0 3rd Qu.:739.0
## Max. :880.0 Max. :899.0
## NA's :591 NA's :591
## FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
## : 697 Min. : 0.00 Min. : 0.00
## 1993-12-01 00:00:00: 185 1st Qu.: 7.00 1st Qu.: 6.00
## 1994-11-01 00:00:00: 178 Median :10.00 Median : 9.00
## 1995-11-01 00:00:00: 168 Mean :10.32 Mean : 9.26
## 1990-04-01 00:00:00: 161 3rd Qu.:13.00 3rd Qu.:12.00
## 1995-03-01 00:00:00: 159 Max. :59.00 Max. :54.00
## (Other) :112389 NA's :7604 NA's :7604
## TotalCreditLinespast7years OpenRevolvingAccounts
## Min. : 2.00 Min. : 0.00
## 1st Qu.: 17.00 1st Qu.: 4.00
## Median : 25.00 Median : 6.00
## Mean : 26.75 Mean : 6.97
## 3rd Qu.: 35.00 3rd Qu.: 9.00
## Max. :136.00 Max. :51.00
## NA's :697
## OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries
## Min. : 0.0 Min. : 0.000 Min. : 0.000
## 1st Qu.: 114.0 1st Qu.: 0.000 1st Qu.: 2.000
## Median : 271.0 Median : 1.000 Median : 4.000
## Mean : 398.3 Mean : 1.435 Mean : 5.584
## 3rd Qu.: 525.0 3rd Qu.: 2.000 3rd Qu.: 7.000
## Max. :14985.0 Max. :105.000 Max. :379.000
## NA's :697 NA's :1159
## CurrentDelinquencies AmountDelinquent DelinquenciesLast7Years
## Min. : 0.0000 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.0 1st Qu.: 0.000
## Median : 0.0000 Median : 0.0 Median : 0.000
## Mean : 0.5921 Mean : 984.5 Mean : 4.155
## 3rd Qu.: 0.0000 3rd Qu.: 0.0 3rd Qu.: 3.000
## Max. :83.0000 Max. :463881.0 Max. :99.000
## NA's :697 NA's :7622 NA's :990
## PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
## Min. : 0.0000 Min. : 0.000 Min. : 0
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 3121
## Median : 0.0000 Median : 0.000 Median : 8549
## Mean : 0.3126 Mean : 0.015 Mean : 17599
## 3rd Qu.: 0.0000 3rd Qu.: 0.000 3rd Qu.: 19521
## Max. :38.0000 Max. :20.000 Max. :1435667
## NA's :697 NA's :7604 NA's :7604
## BankcardUtilization AvailableBankcardCredit TotalTrades
## Min. :0.000 Min. : 0 Min. : 0.00
## 1st Qu.:0.310 1st Qu.: 880 1st Qu.: 15.00
## Median :0.600 Median : 4100 Median : 22.00
## Mean :0.561 Mean : 11210 Mean : 23.23
## 3rd Qu.:0.840 3rd Qu.: 13180 3rd Qu.: 30.00
## Max. :5.950 Max. :646285 Max. :126.00
## NA's :7604 NA's :7544 NA's :7544
## TradesNeverDelinquent..percentage. TradesOpenedLast6Months
## Min. :0.000 Min. : 0.000
## 1st Qu.:0.820 1st Qu.: 0.000
## Median :0.940 Median : 0.000
## Mean :0.886 Mean : 0.802
## 3rd Qu.:1.000 3rd Qu.: 1.000
## Max. :1.000 Max. :20.000
## NA's :7544 NA's :7544
## DebtToIncomeRatio IncomeRange IncomeVerifiable
## Min. : 0.000 $25,000-49,999:32192 False: 8669
## 1st Qu.: 0.140 $50,000-74,999:31050 True :105268
## Median : 0.220 $100,000+ :17337
## Mean : 0.276 $75,000-99,999:16916
## 3rd Qu.: 0.320 Not displayed : 7741
## Max. :10.010 $1-24,999 : 7274
## NA's :8554 (Other) : 1427
## StatedMonthlyIncome LoanKey TotalProsperLoans
## Min. : 0 CB1B37030986463208432A1: 6 Min. :0.00
## 1st Qu.: 3200 2DEE3698211017519D7333F: 4 1st Qu.:1.00
## Median : 4667 9F4B37043517554537C364C: 4 Median :1.00
## Mean : 5608 D895370150591392337ED6D: 4 Mean :1.42
## 3rd Qu.: 6825 E6FB37073953690388BC56D: 4 3rd Qu.:2.00
## Max. :1750003 0D8F37036734373301ED419: 3 Max. :8.00
## (Other) :113912 NA's :91852
## TotalProsperPaymentsBilled OnTimeProsperPayments
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 9.00 1st Qu.: 9.00
## Median : 16.00 Median : 15.00
## Mean : 22.93 Mean : 22.27
## 3rd Qu.: 33.00 3rd Qu.: 32.00
## Max. :141.00 Max. :141.00
## NA's :91852 NA's :91852
## ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 0.00
## Mean : 0.61 Mean : 0.05
## 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :42.00 Max. :21.00
## NA's :91852 NA's :91852
## ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## Min. : 0 Min. : 0
## 1st Qu.: 3500 1st Qu.: 0
## Median : 6000 Median : 1627
## Mean : 8472 Mean : 2930
## 3rd Qu.:11000 3rd Qu.: 4127
## Max. :72499 Max. :23451
## NA's :91852 NA's :91852
## ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## Min. :-209.00 Min. : 0.0
## 1st Qu.: -35.00 1st Qu.: 0.0
## Median : -3.00 Median : 0.0
## Mean : -3.22 Mean : 152.8
## 3rd Qu.: 25.00 3rd Qu.: 0.0
## Max. : 286.00 Max. :2704.0
## NA's :95009
## LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## Min. : 0.00 Min. : 0.0 Min. : 1
## 1st Qu.: 9.00 1st Qu.: 6.0 1st Qu.: 37332
## Median :14.00 Median : 21.0 Median : 68599
## Mean :16.27 Mean : 31.9 Mean : 69444
## 3rd Qu.:22.00 3rd Qu.: 65.0 3rd Qu.:101901
## Max. :44.00 Max. :100.0 Max. :136486
## NA's :96985
## LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## Min. : 1000 2014-01-22 00:00:00: 491 Q4 2013:14450
## 1st Qu.: 4000 2013-11-13 00:00:00: 490 Q1 2014:12172
## Median : 6500 2014-02-19 00:00:00: 439 Q3 2013: 9180
## Mean : 8337 2013-10-16 00:00:00: 434 Q2 2013: 7099
## 3rd Qu.:12000 2014-01-28 00:00:00: 339 Q3 2012: 5632
## Max. :35000 2013-09-24 00:00:00: 316 Q2 2012: 5061
## (Other) :111428 (Other):60343
## MemberKey MonthlyLoanPayment LP_CustomerPayments
## 63CA34120866140639431C9: 9 Min. : 0.0 Min. : -2.35
## 16083364744933457E57FB9: 8 1st Qu.: 131.6 1st Qu.: 1005.76
## 3A2F3380477699707C81385: 8 Median : 217.7 Median : 2583.83
## 4D9C3403302047712AD0CDD: 8 Mean : 272.5 Mean : 4183.08
## 739C338135235294782AE75: 8 3rd Qu.: 371.6 3rd Qu.: 5548.40
## 7E1733653050264822FAA3D: 8 Max. :2251.5 Max. :40702.39
## (Other) :113888
## LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## Min. : 0.0 Min. : -2.35 Min. :-664.87
## 1st Qu.: 500.9 1st Qu.: 274.87 1st Qu.: -73.18
## Median : 1587.5 Median : 700.84 Median : -34.44
## Mean : 3105.5 Mean : 1077.54 Mean : -54.73
## 3rd Qu.: 4000.0 3rd Qu.: 1458.54 3rd Qu.: -13.92
## Max. :35000.0 Max. :15617.03 Max. : 32.06
##
## LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## Min. :-9274.75 Min. : -94.2 Min. : -954.5
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0.0
## Median : 0.00 Median : 0.0 Median : 0.0
## Mean : -14.24 Mean : 700.4 Mean : 681.4
## 3rd Qu.: 0.00 3rd Qu.: 0.0 3rd Qu.: 0.0
## Max. : 0.00 Max. :25000.0 Max. :25000.0
##
## LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## Min. : 0.00 Min. :0.7000 Min. : 0.00000
## 1st Qu.: 0.00 1st Qu.:1.0000 1st Qu.: 0.00000
## Median : 0.00 Median :1.0000 Median : 0.00000
## Mean : 25.14 Mean :0.9986 Mean : 0.04803
## 3rd Qu.: 0.00 3rd Qu.:1.0000 3rd Qu.: 0.00000
## Max. :21117.90 Max. :1.0125 Max. :39.00000
##
## InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## Min. : 0.00000 Min. : 0.00 Min. : 1.00
## 1st Qu.: 0.00000 1st Qu.: 0.00 1st Qu.: 2.00
## Median : 0.00000 Median : 0.00 Median : 44.00
## Mean : 0.02346 Mean : 16.55 Mean : 80.48
## 3rd Qu.: 0.00000 3rd Qu.: 0.00 3rd Qu.: 115.00
## Max. :33.00000 Max. :25000.00 Max. :1189.00
##
The Prosper Loan Data dataset contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.
##           A    AA     B     C     D     E    HR    NC
## 84984  3315  3509  4389  5649  5153  3289  3508   141
The credit rating that was assigned at the time the listing went live. It applies only to listings from the pre-2009 period and is populated only for those listings, which is why 84,984 entries are blank. The y-axis is presented on a logarithmic scale.
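A bar chart with a logarithmic y-axis, as described above, could be produced along the following lines. This is a minimal sketch rather than the report's actual code: the data frame `grade_counts` is a hypothetical stand-in built from the counts printed above, the label "(blank)" is introduced here for readability, and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Hypothetical stand-in: counts copied from the CreditGrade table above.
# The empty grade (84984 listings) is labelled "(blank)" for readability.
grade_counts <- data.frame(
  CreditGrade = c("(blank)", "A", "AA", "B", "C", "D", "E", "HR", "NC"),
  n = c(84984, 3315, 3509, 4389, 5649, 5153, 3289, 3508, 141)
)

# geom_col() draws pre-computed counts; scale_y_log10() keeps the small
# grades visible next to the very large blank category.
p_grade <- ggplot(grade_counts, aes(x = CreditGrade, y = n)) +
  geom_col() +
  scale_y_log10() +
  labs(x = "CreditGrade", y = "Number of listings (log10 scale)")

# print(p_grade) renders the chart on the active graphics device.
```

On a linear axis the blank category would dwarf every other bar, which is presumably why a log scale was chosen here.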
## Cancelled Chargedoff Completed
## 5 11992 38074
## Current Defaulted FinalPaymentInProgress
## 56576 5018 205
## Past Due (>120 days) Past Due (1-15 days) Past Due (16-30 days)
## 16 806 265
## Past Due (31-60 days) Past Due (61-90 days) Past Due (91-120 days)
## 363 313 304
The current status of the loan: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, PastDue. The PastDue status will be accompanied by a delinquency bucket. The y-axis is presented on a logarithmic scale.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00653 0.15629 0.20976 0.21883 0.28381 0.51229 25
The Borrower’s Annual Percentage Rate (APR) for the loan. 25 NA’s have been removed in presentation of the data.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1340 0.1840 0.1928 0.2500 0.4975
The Borrower’s interest rate for this loan.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 3.000 4.000 4.072 5.000 7.000 29084
The Prosper Rating assigned at the time the listing was created: 0 - N/A, 1 - HR, 2 - E, 3 - D, 4 - C, 5 - B, 6 - A, 7 - AA. Applicable for loans originated after July 2009. 29084 NA’s have been removed in presentation of the data.
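The numeric-to-letter mapping listed above can be expressed as a small lookup. The sketch below uses base R only; the helper name `rating_to_alpha` is hypothetical and not part of the dataset or of any library.

```r
# Mapping taken from the variable description above:
# 0 = N/A, 1 = HR, 2 = E, 3 = D, 4 = C, 5 = B, 6 = A, 7 = AA.
rating_labels <- c("N/A", "HR", "E", "D", "C", "B", "A", "AA")

# Hypothetical helper: convert numeric ratings (0-7) to ordered letter grades.
# R vectors are 1-indexed, hence the "+ 1"; NA input stays NA.
rating_to_alpha <- function(rating) {
  factor(rating_labels[rating + 1], levels = rating_labels, ordered = TRUE)
}

example <- rating_to_alpha(c(1, 4, 7, NA))
as.character(example)
```

Using an ordered factor keeps the grades in risk order (HR lowest, AA highest) when they are plotted or compared.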
##           A    AA     B     C     D     E    HR
## 29084 14551  5372 15581 18345 14274  9795  6935
The Prosper Rating assigned at the time the listing was created between AA - HR. Applicable for loans originated after July 2009.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 4.00 6.00 5.95 8.00 11.00 29084
A custom risk score built using historical Prosper data. The score is described as ranging from 1 to 10, with 10 being the best (lowest-risk) score, although the summary above reports a maximum of 11. Applicable for loans originated after July 2009. 29084 NA’s have been removed in presentation of the data.
##                                                Accountant/CPA
##                           3588                           3233
## Administrative Assistant Analyst
## 3688 3602
## Architect Attorney
## 213 1046
## Biologist Bus Driver
## 125 316
## Car Dealer Chemist
## 180 145
## Civil Service Clergy
## 1457 196
## Clerical Computer Programmer
## 3164 4478
## Construction Dentist
## 1790 68
## Doctor Engineer - Chemical
## 494 225
## Engineer - Electrical Engineer - Mechanical
## 1125 1406
## Executive Fireman
## 4311 422
## Flight Attendant Food Service
## 123 1123
## Food Service Management Homemaker
## 1239 120
## Investor Judge
## 214 22
## Laborer Landscaping
## 1595 236
## Medical Technician Military Enlisted
## 1117 1272
## Military Officer Nurse (LPN)
## 346 492
## Nurse (RN) Nurse's Aide
## 2489 491
## Other Pharmacist
## 28617 257
## Pilot - Private/Commercial Police Officer/Correction Officer
## 199 1578
## Postal Service Principal
## 627 312
## Professional Professor
## 13628 557
## Psychologist Realtor
## 145 543
## Religious Retail Management
## 124 2602
## Sales - Commission Sales - Retail
## 3446 2797
## Scientist Skilled Labor
## 372 2746
## Social Worker Student - College Freshman
## 741 41
## Student - College Graduate Student Student - College Junior
## 245 112
## Student - College Senior Student - College Sophomore
## 188 69
## Student - Community College Student - Technical School
## 28 16
## Teacher Teacher's Aide
## 3759 276
## Tradesman - Carpenter Tradesman - Electrician
## 120 477
## Tradesman - Mechanic Tradesman - Plumber
## 951 102
## Truck Driver Waiter/Waitress
## 1675 436
The Occupation selected by the Borrower at the time they created the listing. The y-axis is presented on a logarithmic scale.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 26.00 67.00 96.07 137.00 755.00 7625
The length in months of the employment status at the time the listing was created. The x-axis’s upper range has been limited to 600 removing max outlier at 755.00. 7625 NA’s have been removed in presentation of the data.
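Limiting the visible x-axis range while ignoring missing values might look like the sketch below. The toy data frame `loans` is hypothetical (its values are invented to include one extreme point and one NA), and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Hypothetical toy data: durations in months, including one extreme value
# and one NA, loosely mimicking the summary above.
loans <- data.frame(
  EmploymentStatusDuration = c(0, 12, 26, 40, 67, 90, 137, 200, 755, NA)
)

# na.rm = TRUE drops the NA without a warning; coord_cartesian() zooms the
# view to 0-600 without discarding the 755 observation from the binning.
p_duration <- ggplot(loans, aes(x = EmploymentStatusDuration)) +
  geom_histogram(binwidth = 12, na.rm = TRUE) +
  coord_cartesian(xlim = c(0, 600))
```

One design point worth noting: `coord_cartesian()` only changes what is shown, whereas setting limits on the scale itself (for example with `xlim()`) removes out-of-range rows before the statistics are computed.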
## False True
## 56459 57478
A Borrower will be classified as a homeowner if they have a mortgage on their credit profile or provide documentation confirming they are a homeowner.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 660.0 680.0 685.6 720.0 880.0 591
The lower value representing the range of the borrower’s credit score as provided by a consumer credit rating agency. The x-axis’s upper and lower range has been limited to 450 to 900 removing min outlier at 0.0. 591 NA’s have been removed in presentation of the data.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 19.0 679.0 699.0 704.6 739.0 899.0 591
The upper value representing the range of the borrower’s credit score as provided by a consumer credit rating agency. The x-axis’s upper and lower range has been limited to 450 to 900 removing min outlier at 19.0. 591 NA’s have been removed in presentation of the data.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.0000 0.0000 0.5921 0.0000 83.0000 697
Number of accounts delinquent at the time the credit profile was pulled. The x-axis’s upper range has been limited to 20 removing max outlier at 83.0000. 697 NA’s have been removed in presentation of the data.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 0.0 0.0 984.5 0.0 463881.0 7622
Dollars delinquent at the time the credit profile was pulled. The x-axis’s upper range has been limited to 50000 removing max outlier at 463881.0. 7622 NA’s have been removed in presentation of the data.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.310 0.600 0.561 0.840 5.950 7604
The percentage of available revolving credit that is utilized at the time the credit profile was pulled. The x-axis’s upper range has been limited to 2.0 removing max outlier at 5.950. 7604 NA’s have been removed in presentation of the data.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
The debt to income ratio of the borrower at the time the credit profile was pulled. This value is Null if the debt to income ratio is not available. This value is capped at 10.01 (any debt to income ratio larger than 1000% will be returned as 1001%). The x-axis’s upper range has been limited to 1.5 removing max outlier at 10.010. 8554 NA’s have been removed in presentation of the data.
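The capping rule quoted above can be illustrated with a few lines of base R. This is only one reading of the rule, not Prosper's actual implementation; the helper name `cap_dti` and the sample values are hypothetical.

```r
# Hypothetical helper illustrating the capping rule quoted above:
# ratios above 10 (1000%) are reported as 10.01 (1001%); NA stays NA.
cap_dti <- function(ratio) {
  ifelse(ratio > 10, 10.01, ratio)
}

# Invented sample values: three ordinary ratios, one extreme, one missing.
capped <- cap_dti(c(0.22, 1.5, 10, 37.2, NA))
capped
```

Because of this rule, a maximum of 10.010 in the summary marks the cap rather than an observed ratio, so it says little about how large the underlying ratios really are.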
## $0 $1-24,999 $100,000+ $25,000-49,999 $50,000-74,999
## 621 7274 17337 32192 31050
## $75,000-99,999 Not displayed Not employed
## 16916 7741 806
The income range of the borrower at the time the listing was created.
## False True
## 8669 105268
The borrower indicated they have the required documentation to support their income.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3200 4667 5608 6825 1750003
The monthly income the borrower stated at the time the listing was created. The x-axis is presented on a square root scale and has been limited to 50000 removing max outlier at 1750003.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
The origination amount of the loan. The y-axis is presented on a logarithmic scale and the x-axis on a square-root scale.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 131.6 217.7 272.5 371.6 2251.5
The scheduled monthly loan payment. The x-axis’s upper range has been limited to 1500 removing max outlier at 2251.5.
This dataset contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.
The main feature of interest is ProsperScore, a risk score on a scale of 1 to 10: how it is likely determined and which factors influence it.
Supporting features are those that indicate the customer’s ability to take on debt, such as Occupation, CreditScoreRange, DebtToIncomeRatio, etc.
No new variables were created in the investigation of this dataset; 81 are more than enough!
In general, all features were investigated using histograms with various bin widths, with some scaling of the x-axis using scale_x_sqrt() and some scaling of the y-axis using scale_y_log10(). element_text() was applied to the x-axis text to make the labels easier to read, and coord_cartesian() with x-axis limits was used to restrict the x-axis. For True/False variables such as IncomeVerifiable or IsBorrowerHomeowner, geom_bar() was used instead.
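The plotting techniques named above could be combined roughly as follows. This is a sketch under assumptions, not the code behind the report: the toy data frame `loans` and its values are invented, the rotation angle is arbitrary, and ggplot2 is assumed to be installed.

```r
library(ggplot2)

# Hypothetical toy data standing in for the Prosper data frame.
loans <- data.frame(
  StatedMonthlyIncome = c(0, 1500, 3200, 4667, 5608, 6825, 12000, 48000),
  IsBorrowerHomeowner = c("True", "False", "True", "True",
                          "False", "False", "True", "False")
)

# Histogram on a square-root x-axis; the tick labels are rotated with
# element_text() so that they do not overlap.
p_income <- ggplot(loans, aes(x = StatedMonthlyIncome)) +
  geom_histogram(bins = 10) +
  scale_x_sqrt() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# geom_bar() counts rows per category, which suits True/False variables.
p_home <- ggplot(loans, aes(x = IsBorrowerHomeowner)) +
  geom_bar()
```

A square-root axis compresses large values less aggressively than a log axis and, unlike a log axis, can display an income of 0.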
##                    Employed     Full-time Not available  Not employed
##          2255         67322         26355          5347           835
##         Other     Part-time       Retired Self-employed
##          3806          1088           795          6134
Bivariate boxplot of ProsperScore (y-axis) as a function of EmploymentStatus. Curiously, Self-employed garners a lower ProsperScore than Not employed, as a comparison of the medians of the two boxplots shows.
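The medians that the boxplots display can also be computed directly, which makes the comparison between groups explicit. The sketch below uses base R and a hypothetical toy data frame; the scores are invented purely to show the mechanics and say nothing about the real data.

```r
# Hypothetical toy data; the real comparison would use the full data set.
loans <- data.frame(
  EmploymentStatus = c("Employed", "Employed", "Self-employed",
                       "Self-employed", "Not employed", "Not employed"),
  ProsperScore = c(6, 8, 4, 5, 5, 7)
)

# Median ProsperScore per employment status, ignoring missing scores.
medians <- tapply(loans$ProsperScore, loans$EmploymentStatus,
                  median, na.rm = TRUE)
medians
```

A table of medians like this is a useful companion to the boxplot when groups differ greatly in size, as they do here (67322 Employed versus 835 Not employed).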
## False True
## 56459 57478
Bivariate boxplot of ProsperScore as a function of IsBorrowerHomeowner. As the nearly identical medians of the two boxplots show, owning a home has no appreciable effect upon one’s ProsperScore.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 660.0 680.0 685.6 720.0 880.0 591
Bivariate boxplot of ProsperScore as a function of CreditScoreRangeLower. As the medians of the boxplots show, CreditScoreRangeLower is positively correlated with ProsperScore. 591 NA’s have been removed in presentation of the data.
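A visual impression of correlation can be backed by a rank correlation coefficient. The sketch below is an illustration with invented numbers, not a result from the Prosper data; Spearman's method is chosen here as an assumption because ProsperScore is an ordinal score.

```r
# Hypothetical toy data with missing values in both columns.
loans <- data.frame(
  CreditScoreRangeLower = c(640, 660, 680, 700, 720, 760, NA, 800),
  ProsperScore = c(3, 4, 6, 5, 7, 9, 6, NA)
)

# use = "complete.obs" drops rows where either value is NA, mirroring the
# removal of NA values before plotting.
rho <- cor(loans$CreditScoreRangeLower, loans$ProsperScore,
           method = "spearman", use = "complete.obs")
rho
```

A value near +1 would support the "positively correlated" reading, while a value near 0 would suggest the boxplot medians are misleading.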
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
Bivariate multi-plot of DebtToIncomeRatio as a function of ProsperScore with red point stat_summary indicating the ProsperScore mean. ProsperScore is negatively correlated with DebtToIncomeRatio.
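Overlaying a red mean marker on a boxplot with stat_summary() might be done as sketched below. The toy data are invented, and the sketch assumes ggplot2 version 3.3.0 or later, where the argument is named `fun` (older versions used `fun.y`). In this sketch the red point marks the mean of the y variable within each ProsperScore group.

```r
library(ggplot2)

# Hypothetical toy data: two loans for each of four ProsperScore values.
loans <- data.frame(
  ProsperScore = c(2, 2, 5, 5, 8, 8, 10, 10),
  DebtToIncomeRatio = c(0.45, 0.60, 0.30, 0.35, 0.20, 0.25, 0.10, 0.15)
)

# factor() turns the score into discrete groups, one boxplot per score;
# stat_summary() adds one red point per group at the group mean.
p_dti <- ggplot(loans, aes(x = factor(ProsperScore), y = DebtToIncomeRatio)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3) +
  labs(x = "ProsperScore", y = "DebtToIncomeRatio")
```

Showing the mean next to the median line helps reveal skew: where the red point sits well above the median, a few large ratios are pulling the average up.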
## $0 $1-24,999 $100,000+ $25,000-49,999 $50,000-74,999
## 621 7274 17337 32192 31050
## $75,000-99,999 Not displayed Not employed
## 16916 7741 806
Bivariate boxplot of ProsperScore as a function of IncomeRange. Within the $25,000-74,999 income ranges, ProsperScore appears uniformly distributed.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3200 4667 5608 6825 1750003
Bivariate boxplot of ProsperScore as a function of StatedMonthlyIncome on a logarithmic scale. Approaching a StatedMonthlyIncome of 1e+04, ProsperScore appears uniformly distributed.
Bivariate multi-plot of StatedMonthlyIncome as a function of ProsperScore, with red stat_summary points indicating the ProsperScore mean. ProsperScore is uniformly distributed approaching 1e+04 on a logarithmic scale.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 131.6 217.7 272.5 371.6 2251.5
Bivariate boxplot of MonthlyLoanPayment as a function of ProsperScore. ProsperScore appears slightly positively correlated with MonthlyLoanPayment below 500.
Bivariate multi-plot of MonthlyLoanPayment as a function of ProsperScore, with red stat_summary points indicating the ProsperScore mean. ProsperScore appears slightly positively correlated with MonthlyLoanPayment below 500.
Full-time EmploymentStatus seems to garner a greater ProsperScore than Not employed, yet Not employed garners a greater ProsperScore than Self-employed. IsBorrowerHomeowner seems to have little bearing on ProsperScore, which is uniformly distributed between True and False. CreditScoreRangeLower, especially above 800, is positively correlated with ProsperScore. DebtToIncomeRatio is initially negatively correlated with ProsperScore, but the relationship vacillates as the ratio rises above 2.5. IncomeRange is positively correlated with ProsperScore above 50k. In general, ProsperScore seems uniformly distributed across StatedMonthlyIncome, although outliers with a StatedMonthlyIncome greater than 1,500,000 have lower scores than borrowers with a StatedMonthlyIncome of 250k or much less. MonthlyLoanPayment is slightly positively correlated with ProsperScore.
The most interesting relationships observed were that Not employed seems to garner a greater ProsperScore than Self-employed, which seems counter-intuitive, and that the relationship between DebtToIncomeRatio and ProsperScore fluctuates as the ratio rises above 2.5.
IsBorrowerHomeowner seems to have no bearing whatsoever on ProsperScore.
Multivariate plot of ProsperScore as a function of IsBorrowerHomeowner for the Computer Programmer occupation. As the nearly identical shading across almost all color bars shows, IsBorrowerHomeowner appears to have no correlation with ProsperScore.
Multivariate plot of ProsperScore as a function of IsBorrowerHomeowner for the Doctor occupation. As the nearly identical shading across almost all color bars shows, IsBorrowerHomeowner appears to have no correlation with ProsperScore.
Multivariate plot of ProsperScore as a function of IsBorrowerHomeowner for the Tradesman - Plumber occupation. As the nearly identical shading across almost all color bars shows, IsBorrowerHomeowner appears to have no correlation with ProsperScore.
Multivariate plot of CreditScoreRangeLower as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. CreditScoreRangeLower appears slightly positively correlated with ProsperScore for all three occupations.
Multivariate plot of CreditScoreRangeUpper as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. CreditScoreRangeUpper appears slightly positively correlated with ProsperScore for all three occupations.
Multivariate plot of DebtToIncomeRatio as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. DebtToIncomeRatio appears somewhat negatively correlated with ProsperScore for all three occupations, with many outliers below the given median.
Multivariate plot of StatedMonthlyIncome as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. StatedMonthlyIncome appears slightly positively correlated with ProsperScore for all three occupations, with a few outliers above the given median.
Multivariate plot of MonthlyLoanPayment as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. MonthlyLoanPayment appears somewhat positively correlated with ProsperScore for all three occupations, with some outliers above the given median.
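Restricting the data to the three occupations and plotting them side by side could be sketched as follows. The toy data frame `loans`, its values, and the choice of facet_wrap() are assumptions for illustration; the report's actual plotting code is not shown in the text. ggplot2 is assumed to be installed.

```r
library(ggplot2)

# The three occupations examined in the multivariate plots above.
occupations <- c("Computer Programmer", "Doctor", "Tradesman - Plumber")

# Hypothetical toy data; one row has an occupation outside the selection.
loans <- data.frame(
  Occupation = c("Computer Programmer", "Doctor", "Tradesman - Plumber",
                 "Teacher", "Doctor", "Computer Programmer"),
  ProsperScore = c(7, 9, 5, 6, 8, 4),
  MonthlyLoanPayment = c(320, 510, 180, 220, 450, 150)
)

# Keep only the rows for the three selected occupations.
three <- subset(loans, Occupation %in% occupations)

# One panel per occupation, so the same relationship can be compared
# across panels that share their axes.
p_occ <- ggplot(three, aes(x = factor(ProsperScore), y = MonthlyLoanPayment,
                           colour = Occupation)) +
  geom_point() +
  facet_wrap(~ Occupation) +
  labs(x = "ProsperScore", y = "MonthlyLoanPayment")
```

Filtering before plotting keeps the legend and panels limited to the chosen groups, which sidesteps the readability problem of showing every listed occupation at once.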
IsBorrowerHomeowner appears evenly divided between all three occupational classes.
For all three occupations, ProsperScore seems to roughly correspond to CreditScore based on the CreditScoreRangeUpper and CreditScoreRangeLower plots.
DebtToIncomeRatio appears somewhat negatively correlated with ProsperScore for all three occupations.
StatedMonthlyIncome appears slightly positively correlated with ProsperScore for all three occupations.
MonthlyLoanPayment appears somewhat positively correlated with ProsperScore for all three occupations.
That IsBorrowerHomeowner seems evenly divided across all three occupational classes is somewhat surprising: one would expect a greater percentage of Doctors, for instance, to own a home, or Doctors who already own a home not to need a loan. Perhaps the loans are for a second home?
Bivariate boxplot of ProsperScore (y-axis) as a function of EmploymentStatus. Curiously, Self-employed garners a lower ProsperScore than Not employed: the median of the Self-employed boxplot is lower than the median of the Not employed boxplot.
Bivariate boxplot of ProsperScore as a function of IsBorrowerHomeowner. As the nearly identical medians of the two boxplots show, owning a home has no appreciable effect upon one’s ProsperScore.
Multivariate plot of DebtToIncomeRatio as a function of ProsperScore for the Computer Programmer, Doctor and Tradesman - Plumber occupations. DebtToIncomeRatio appears somewhat negatively correlated with ProsperScore for all three occupations, with many outliers below the given median.
The Prosper Loan Data is large and incorporates many variables, which makes it difficult to decide what to focus on. After much study and reflection, ProsperScore presented itself as deserving a more thorough examination: what is ProsperScore and, given its proprietary nature and the presumption that loans are granted based upon it, how is it determined?
Exploring the Prosper Loan Data with regard to ProsperScore met with a few difficulties due to the sprawling number of variables and their size. For instance, plotting every listed Occupation surpassed ggplot’s ability to display them, and I was forced to settle upon three Occupations to roughly represent the blue-collar, white-collar and professional realms so as to delineate possible relationships more simply and clearly.
Ultimately, it seems that ProsperScore is a complex calculation that cannot be easily pinned down via an analysis of three variables, as no doubt each of the Prosper dataset’s 81 variables plays a part in a much more complex determination of a customer’s final score. This conclusion is bolstered by the somewhat counter-intuitive findings from the dataset’s exploration and analysis in R: for instance, that someone Self-employed would receive a lower ProsperScore than someone Not employed, that IsBorrowerHomeowner being True receives no preference in ProsperScore, or, as an outlier, that a Doctor’s ProsperScore can rise as their DebtToIncomeRatio climbs. How these variables are offset by other, as yet unexamined variables can be a focus of future work with the dataset.